
	
	**The purpose of this .do file is to add county information to detainer data**
	
		

			**	**	**	**	**	**	**	**	**	**	**	**	**
			**	**	**	**	**	**	**	**	**	**	**	**	**
			**	**	**	**	**	**	**	**	**	**	**	**	**
	

*-------------------------------------------------------------------------------
* Start with TRAC spreadsheet
*-------------------------------------------------------------------------------

import excel "facilities by state.xlsx", ///
sheet("Sheet1") firstrow clear



**
*keep only relevant variables
**
renvars, lower
keep state facility


**
*split facility
**

split facility

**
*rename and keep
**

rename facility1 detentionfacilitycode

keep state detentionfacilitycode

drop if de == ""

duplicates drop

**
*save for merge
**

save "statefacmerge.dta", replace


*-------------------------------------------------------------------------------
* Start with all detainers data
*-------------------------------------------------------------------------------

use "detainersallyears.dta", clear


*-------------------------------------------------------------------------------
* now keep only geography related variables
*-------------------------------------------------------------------------------

keep areaofresponsibility site detentionfacilitycode detentionfacility


*-------------------------------------------------------------------------------
* now remove duplicates
*-------------------------------------------------------------------------------

duplicates drop detentionfacilitycode detentionfacility, force

*-------------------------------------------------------------------------------
* merge in states -- note that this leaves 725 unidentified
*-------------------------------------------------------------------------------

merge 1:1 detentionfacilitycode using "statefacmerge.dta"

drop if _m == 2

drop _m


*-------------------------------------------------------------------------------
* Extract as many county names as possible from facility names
*-------------------------------------------------------------------------------

gen detentionfacilitylower = lower(detentionfacility)

split detentionfacilitylower


**
*add together county names
**

*work backward
gen county = detentionfacilitylower1 if detentionfacilitylower2=="county"
replace county = detentionfacilitylower1 + " " + detentionfacilitylower2 ///
if detentionfacilitylower3=="county"
replace county = detentionfacilitylower1 + " " + detentionfacilitylower2 + " " + detentionfacilitylower3 ///
if detentionfacilitylower4=="county"
replace county = detentionfacilitylower1 + " " + detentionfacilitylower2 + " " + detentionfacilitylower3 ///
+ " " + detentionfacilitylower4 if detentionfacilitylower5=="county"

**unnecessary but in case
replace county = detentionfacilitylower1 + detentionfacilitylower2 + detentionfacilitylower3 ///
+ detentionfacilitylower4 + detentionfacilitylower5 + detentionfacilitylower6 if detentionfacilitylower6=="county"
replace county = detentionfacilitylower1 + detentionfacilitylower2 + detentionfacilitylower3 ///
+ detentionfacilitylower4 + detentionfacilitylower5 + detentionfacilitylower6 + detentionfacilitylower7 ///
if detentionfacilitylower7=="county"

**now same process for abbreviation "co."
replace county = detentionfacilitylower1 if detentionfacilitylower2=="co."
replace county = detentionfacilitylower1 + " " + detentionfacilitylower2 ///
if detentionfacilitylower3=="co."
replace county = detentionfacilitylower1 + " " + detentionfacilitylower2 + " " + detentionfacilitylower3 ///
if detentionfacilitylower4=="co."
replace county = detentionfacilitylower1 + " " + detentionfacilitylower2 + " " + detentionfacilitylower3 ///
+ " " + detentionfacilitylower4 if detentionfacilitylower5=="co."

**now same process for abbreviation "co"
replace county = detentionfacilitylower1 if detentionfacilitylower2=="co"
replace county = detentionfacilitylower1 + " " + detentionfacilitylower2 ///
if detentionfacilitylower3=="co"
replace county = detentionfacilitylower1 + " " + detentionfacilitylower2 + " " + detentionfacilitylower3 ///
if detentionfacilitylower4=="co"
replace county = detentionfacilitylower1 + " " + detentionfacilitylower2 + " " + detentionfacilitylower3 ///
+ " " + detentionfacilitylower4 if detentionfacilitylower5=="co"

**now same process for "parish"
replace county = detentionfacilitylower1 if detentionfacilitylower2=="parish"
replace county = detentionfacilitylower1 + " " + detentionfacilitylower2 ///
if detentionfacilitylower3=="parish"
replace county = detentionfacilitylower1 + " " + detentionfacilitylower2 + " " + detentionfacilitylower3 ///
if detentionfacilitylower4=="parish"
replace county = detentionfacilitylower1 + " " + detentionfacilitylower2 + " " + detentionfacilitylower3 ///
+ " " + detentionfacilitylower4 if detentionfacilitylower5=="parish"

**
*We now have counties for about half of all facilities
**

drop detentionfacilitylower1-detentionfacilitylower8

***
*Find states in the last two letters of facility codes
***

**first take last two letters and make them a state abbreviation variable
gen stateab = substr(detentionfacilitycode, -2, .)

**identify which of these are actually state abbreviations
statastates, ab(stateab)

*clean up
replace stateab = "" if state_name == ""
drop _m
replace state_name = lower(state_name)

*Make state lower case
replace state = lower(state)

**now look at and resolve conflicts ; "state" is always correct where there is a conflict
br detentionfacilitycode detentionfacility state state_name county if ///
state != "" & state_name != "" & state != state_name

**now look at all the new state names that aren't in conflict and make manual corrections where needed
br detentionfacilitycode detentionfacility state_name county if ///
state == "" & state_name != "" 

**
*Manual corrections
**

replace state_name = "texas" if detentionfacilitycode == "TXSJHAL"

replace state_name = "maine" if detentionfacilitycode == "MEYTHCT"

replace state_name = "new york" if detentionfacilitycode == "NYWENDC"

replace state_name = "florida" if detentionfacilitycode == "FLGLSDC"

replace state_name = "michigan" if detentionfacilitycode == "MIDETDC"

replace state_name = "louisiana" if detentionfacilitycode == "LADWADE"

replace county = "claiborne" if detentionfacilitycode == "LADWADE"

replace county = "" if detentionfacilitycode == "5COJVID"

replace state_name = "arizona" if detentionfacilitycode == "AZSPWIN"

replace state_name = "indiana" if detentionfacilitycode == "INWOPRI"

**
*Combine state info; 215 still missing
**

replace state = state_name if state == ""
count if state == ""
assert r(N) == 215

**
*Deal with remaining duplicates.
**

*note that sometimes the same facility name has more than one code

*we can still delete 6 duplicates because we care about county-wide policies here

sort state county detentionfacilitylower detentionfacilitycode
by state county detentionfacilitylower: drop if _n > 1

count
assert r(N) == 4610


**
*Create id
**

sort state county detentionfacilitylo
gen iceid = _n



**
*save for reclink fuzzy matching
**

save "detainerstates", replace


